Action selection for stochastic, delayed reward
Abstract
This paper gives a novel account of quick decision making for maximising delayed reward in a stochastic world. The approach rests on observable operator models of stochastic systems, which generalize hidden Markov models. A particular kind of decision situation is outlined, and an algorithm is presented that estimates the probability of future reward at a computational cost of only O(im), where i is the number of action alternatives and m is the model dimension.
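As a rough illustration of where a cost of O(im) can come from (this is not the paper's algorithm, which the abstract does not spell out): in an observable operator model the probability of a future event is a linear function of the current m-dimensional state vector. If one assumes that each of the i actions has a precomputed length-m vector encoding "probability of future reward after this action", then scoring all actions takes i inner products of length m. The names `score_actions`, `best_action`, and `reward_vectors`, and all numbers below, are hypothetical.

```python
def score_actions(reward_vectors, state):
    """Return one inner product per action.

    reward_vectors: list of i vectors of length m (assumed precomputed;
                    how to obtain them is not covered by this sketch).
    state:          current model state, a vector of length m.
    Total work is i * m multiply-adds, i.e. O(im).
    """
    return [sum(v_j * w_j for v_j, w_j in zip(v, state)) for v in reward_vectors]


def best_action(reward_vectors, state):
    """Pick the index of the action with the highest estimated score."""
    scores = score_actions(reward_vectors, state)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example with i = 3 actions and model dimension m = 2 (made-up numbers).
state = [0.5, 0.5]
reward_vectors = [
    [0.25, 0.5],   # hypothetical vector for action 0
    [0.75, 0.5],   # hypothetical vector for action 1
    [0.5, 0.5],    # hypothetical vector for action 2
]
action, scores = best_action(reward_vectors, state)
print(action, scores)
```

The point of the sketch is only the cost structure: once the per-action vectors exist, choosing an action needs no simulation of future trajectories, just one pass over an i-by-m table.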
Similar resources
Best Action Selection in a Stochastic Environment
We study the problem of selecting the best action from multiple candidates in a stochastic environment. In such a stochastic setting, when taking an action, a player receives a random reward and incurs a random cost, which are drawn from two unknown distributions. We aim to select the best action, the one with the maximum ratio of the expected reward to the expected cost, after exploring...
MODIFIED ACTION VALUE METHOD APPLIED TO ‘n’-ARMED BANDIT PROBLEMS USING REINFORCEMENT LEARNING
Reinforcement Learning (RL) is an area of Artificial Intelligence (AI) concerned with how an agent should take actions in a stochastic environment so as to optimize a cumulative reward signal. This paper investigates a modified approach to action value methods used to solve n-armed bandit problems, where one is repeatedly faced with a choice among n different options. The selection of the action ma...
Reward-Risk Portfolio Selection and Stochastic Dominance
The portfolio selection problem is traditionally modelled by two different approaches. The first one is based on an axiomatic model of risk-averse preferences, where decision makers are assumed to possess an expected utility function and the portfolio choice consists in maximizing the expected utility over the set of feasible portfolios. The second approach, first proposed by Markowitz (1952), ...
Uncertain Delayed Renewal Reward Process and Its Applications
An uncertain process is a sequence of uncertain variables indexed by time. This paper introduces a kind of uncertain process, named the uncertain delayed renewal reward process, whose interarrival times and rewards (or costs) are regarded as uncertain variables, with the first interarrival (i.e., renewal) time and reward different from the others. The main results include the uncert...
Second Order Stochastic Dominance, Reward-Risk Portfolio Selection and the CAPM
Starting from the reward-risk model for portfolio selection introduced in De Giorgi (2004), we derive the reward-risk Capital Asset Pricing Model (CAPM) analogously to the classical mean-variance CAPM. The reward-risk portfolio selection arises from an axiomatic definition of reward and risk measures based on a few basic principles, including consistency with second order stochastic dominance. Wi...
Publication date: 1999